Abstract.

A new approach to understanding the complex behavior of financial market indexes,
using tools from thermodynamics and statistical physics, is developed. Physical complexity,
a magnitude rooted in Kolmogorov-Chaitin theory, is applied to binary sequences built
from real time series of financial market indexes. The study is based on NASDAQ and
Mexican IPC data. This magnitude behaves differently on intervals of the series that
precede crashes than on intervals in which no financial turbulence is observed. The
connection between our results and the Efficient Market Hypothesis is discussed.
1 Introduction.

In recent years financial markets have received growing attention from the general public.
Financial storms are discussed in journals, newspapers and TV news services with the same
emphasis as weather storms, and their duration and consequences are analyzed by
specialists in both fields.

A few years ago physicists started to investigate financial data, since financial markets
are remarkably well-defined complex systems, continuously monitored down to time scales
of seconds. Besides, almost every economic transaction is recorded, and an increasing
amount of economic data is becoming accessible to interested researchers. Hence,
financial markets are extremely attractive for researchers aiming to understand complex
systems.

Several toy models of market behavior have been developed, notable among them the
so-called Minority Game [1]. In that model, a magnitude resembling the real volatility
(and also called volatility) has been extensively studied [2-9]. As we showed in [10],
there are more sensitive measures of the behavior of the model, which have their origin
in statistical physics and thermodynamics. In further papers [11-12] we studied in more
detail one of these measures, called physical complexity [13-15]. In [12] we proposed an
ansatz of the type $C(l) = l^{\alpha}$ for the average value of the physical complexity
taken over an ensemble of binary sequences. We also showed that the exponent $\alpha$
strongly depends on the parameters of the model.

The aim of this work is to extend the study of this magnitude and its properties to the
time series of real financial markets. As we will show, the behavior of this measure of
complexity differs drastically between those intervals of the financial time series that
precede crashes and those where no financial turbulence was observed. We claim that this
fact is closely related to the Efficient Market Hypothesis (EMH).
The structure of this paper is as follows: In Sec. 2 we discuss the way in which a real time
series is encoded as a binary string. We also briefly develop the theoretical tools used in
our analysis. In Sec. 3 we present our results concerning the behavior of the physical
complexity, and in Sec. 4 we discuss our results in relation to the most accepted paradigm
of financial market behavior.

2 The binary string associated with the real time series.

One could conceive of a financial market as a set of $N$ agents, each of them taking a
binary decision at every time step. This is an extremely crude representation, but it
captures the essential feature that decisions can be coded by binary symbols (buy = 0,
sell = 1, for example). Despite the extreme simplification (1), the above setup allows a
"stylized" definition of price.

Let $N_t^0$, $N_t^1$ be the number of agents taking the decision 0, 1 respectively at
time $t$. Obviously, $N = N_t^0 + N_t^1$ for every $t$. Then, with the above definition
of the binary code, the price can be defined as:

$$p_t = f\left(\frac{N_t^0}{N_t^1}\right)$$

where $f$ is an increasing and convex function which also satisfies:
a) $f(0) = 0$
b) $\lim_{x \to \infty} f(x) = \infty$
c) $\lim_{x \to 0} f(x) = 0$

The above definition agrees with the common belief about how supply and demand work.
If $N_t^0$ is small and $N_t^1$ is large, then there are few agents willing to buy and
many agents willing to sell, hence the price should be low. If, on the contrary, $N_t^0$
is large and $N_t^1$ is small, then there are many agents willing to buy and just a few
agents willing to sell, hence the price should be high. Notice that the winning choice is
related to the minority choice. A complete study of the suitability of the properties of
the function $f$ and their consequences will appear in [17].
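
As a minimal sketch of the stylized price defined above, the following Python fragment uses the illustrative choice f(x) = x^2, which is increasing and convex for x >= 0 and vanishes at the origin. The name `stylized_price` and this particular f are assumptions made only for the example; the text does not fix a specific form of f.

```python
def stylized_price(n_buy: int, n_sell: int) -> float:
    """Stylized price f(n_buy / n_sell) with the assumed choice f(x) = x**2."""
    if n_sell == 0:
        # No sellers: the ratio diverges, and so does the price.
        return float("inf")
    ratio = n_buy / n_sell
    return ratio ** 2


# Few buyers and many sellers give a low price; the reverse gives a high price.
low = stylized_price(10, 90)
high = stylized_price(90, 10)
print(low, high)
```

Any other increasing convex function with the same limiting behavior could be substituted without changing the qualitative picture.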

We exploit the above analogy to construct a binary time series associated with each real
time series of financial markets. The procedure is as follows: Let $\{p_t\}_{t \in \mathbb{N}}$
be the original real time series. Then we construct a binary time series
$\{a_t\}_{t \in \mathbb{N}}$ by the rule:

$$a_t = \begin{cases} 1 & p_t \geq p_{t-1} \\ 0 & p_t < p_{t-1} \end{cases}$$

A similar technique was originally introduced in the context of natural languages [18] and
more recently in the study of cross-correlations between different market places [19]. To
the resulting binary series we apply the theoretical tools described below.
Physical complexity (first studied in [13] and [14]) is defined as the number of binary
digits that are explainable (or meaningful) with respect to the environment in a string
$\eta$. In reference to our problem, the only physical record one gets is the binary string
built up from the original real time series, and we consider it as the environment
$\varepsilon$. We study the physical complexity of substrings of $\varepsilon$. The
comprehension of their complex features has high practical importance. The amount of data
agents take into account in order to elaborate their choice is finite and of short range
[20]. For every time step $t$ the binary digits $a_{t-l}, a_{t-l+1}, \ldots, a_{t-1}$
represent in some sense the winning choices made by the agents at the corresponding
instants of time. Therefore, the binary strings $a_{t-l} a_{t-l+1} \cdots a_{t-1}$ carry
some information about the behavior of agents. Hence, the complexity of these finite
strings is a measure of how complex the information faced by agents is. We study the
complexity of statistical ensembles of these substrings for several values of $l$.

Footnote 1: Among other reasons, because agents often do not participate and prefer to
wait for a clearer position of the market [16].

We briefly review some measures devoted to analyzing the complexity of binary strings. The
Kolmogorov-Chaitin complexity [21], [22] is defined as the length of the shortest
program $\pi$ producing the sequence $\eta$ when run on a universal Turing machine $T$:

$$K(\eta) = \min \{ |\pi| : \eta = T(\pi) \} \qquad (1)$$

where $|\pi|$ represents the length of $\pi$ in bits, $T(\pi)$ the result of running $\pi$
on the Turing machine $T$, and $K(\eta)$ the Kolmogorov-Chaitin complexity of the sequence
$\eta$. In the framework of this theory, a string is said to be regular if
$K(\eta) < |\eta|$. It means that $\eta$ can be described by a program $\pi$ with length
smaller than the length of $\eta$.

As we have said, the interpretation of a string should be done in the framework of an
environment. Hence, let us imagine a Turing machine that takes the string $\varepsilon$ as
input. We can define the conditional complexity $K(\eta | \varepsilon)$ [13-15] as the
length of the smallest program that computes $\eta$ in a Turing machine having
$\varepsilon$ as input:

$$K(\eta | \varepsilon) = \min \{ |\pi| : \eta = C_T(\pi, \varepsilon) \} \qquad (2)$$
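
The quantities in Eqs. 1 and 2 cannot be computed exactly, but their meaning can be illustrated with a general-purpose compressor. The sketch below is not the method used in this paper: it takes the length of a zlib-compressed string as a crude upper-bound proxy for the Kolmogorov-Chaitin complexity, and the extra compressed length of the environment followed by the string as a proxy for the conditional complexity. The helper names `compressed_bits` and `conditional_bits` are hypothetical.

```python
import random
import zlib


def compressed_bits(bits: str) -> int:
    """Length in bits of the zlib-compressed string (crude proxy for K)."""
    return 8 * len(zlib.compress(bits.encode("ascii"), 9))


def conditional_bits(eta: str, env: str) -> int:
    """Extra compressed bits needed for eta once env is given (proxy for K(eta | env))."""
    return max(0, compressed_bits(env + eta) - compressed_bits(env))


random.seed(0)
periodic = "01" * 500                                    # a highly regular string
noisy = "".join(random.choice("01") for _ in range(1000))  # a pseudo-random string

print(compressed_bits(periodic), compressed_bits(noisy))
print(conditional_bits(periodic, periodic))
```

The regular string admits a much shorter description than the pseudo-random one of the same length, which is the sense in which a string is called regular above. Compressor overheads make these numbers only indicative.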
We want to stress that $K(\eta | \varepsilon)$ represents those bits in $\eta$ that are
random with respect to $\varepsilon$ [13]. Finally, the physical complexity can be defined
as the number of bits that are meaningful in $\eta$ with respect to $\varepsilon$:

$$K(\eta : \varepsilon) = |\eta| - K(\eta | \varepsilon) \qquad (3)$$

Notice that $|\eta|$ also represents (see [13-15]) the unconditional complexity of the
string $\eta$, i.e., the value of the complexity if the input were
$\varepsilon = \emptyset$. Of course, the measure $K(\eta : \varepsilon)$ as defined in
Eq. 3 has few practical applications, mainly because it is impossible to know the way in
which information about $\varepsilon$ is encoded in $\eta$. However (as shown in [15] and
references therein), if a statistical ensemble of strings is available to us, then the
determination of complexity becomes an exercise in information theory. It can be proved
that the average value $C(|\eta|)$ of the physical complexity $K(\eta : \varepsilon)$ taken
over an ensemble $E$ of strings of length $|\eta|$ can be approximated by:

$$C(|\eta|) = \langle K(\eta : \varepsilon) \rangle_E \simeq |\eta| - K(E | \varepsilon) \qquad (4)$$

where:

$$K(E | \varepsilon) = -\sum_{\eta \in E} p(\eta | \varepsilon) \log_2 p(\eta | \varepsilon) \qquad (5)$$

and the sum is taken over all the strings $\eta$ in the ensemble $E$. In a population of
$N$ strings in environment $\varepsilon$, the quantity $n(\eta)/N$, where $n(\eta)$ denotes
the number of strings equal to $\eta$ in $E$, approximates $p(\eta | \varepsilon)$ as
$N \to \infty$.



Let $\varepsilon = \{a_t\}_{t \in \mathbb{N}}$ and let $l$ be a positive integer, $l > 2$.
Let $E_l$ be the ensemble of sequences of length $l$ built up by a moving window of length
$l$, i.e., if $\eta \in E_l$ then $\eta = a_i a_{i+1} \cdots a_{i+l-1}$ for some value of
$i$. We calculate the values of $C(l)$ using this kind of ensemble $E_l$. The selection of
strings $\varepsilon$ that we make in Sec. 3 is related to periods before crashes and, in
contrast, periods with low uncertainty in the market.
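
The whole procedure of this section (binary encoding, moving-window ensemble, frequency estimate of the probabilities and Eqs. 4 and 5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the digit 1 is assumed to mark a price that did not fall, the window moves one step at a time, the function names are hypothetical, and the price list is made-up data, not market data.

```python
import math
from collections import Counter


def encode(prices):
    """Binary series: 1 if the price did not fall, 0 if it fell (assumed convention)."""
    return [1 if cur >= prev else 0 for prev, cur in zip(prices, prices[1:])]


def window_ensemble(bits, l):
    """All substrings of length l obtained with a moving window of step 1."""
    return [tuple(bits[i:i + l]) for i in range(len(bits) - l + 1)]


def average_physical_complexity(bits, l):
    """C(l) = l - K(E_l | eps), with p(eta | eps) estimated by n(eta) / N."""
    ensemble = window_ensemble(bits, l)
    total = len(ensemble)
    counts = Counter(ensemble)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return l - entropy


prices = [100, 101, 100, 101, 100, 101, 100, 101, 100]  # made-up data
bits = encode(prices)
print(bits)
print(average_physical_complexity(bits, 3))
```

For the alternating series above only two distinct windows of length 3 occur, with equal frequency, so the estimated entropy is 1 bit and the sketch gives C(3) = 2. With real data the frequency estimate is only reliable when the number of windows is much larger than the number 2^l of possible strings.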
3 The study of real time series. 

We have used the daily closing values of the NASDAQ Composite in the period from
January 3, 1995 to April 18, 2000, and also the daily closing values of the IPC, the
leading index of the Mexican Stock Exchange (BMV), in the period from January 2, 1991 to
March 27, 2000. The evolution of both indices can be seen in Fig. 1. For both indices we
select several time intervals with the following characteristics:

a) Intervals just before the crashes: the initial point is selected after the onset of the 
bubble and the last point is that of the all-time high of the index. 

b) Intervals where no financial turbulence is observed. 

For the NASDAQ we select three periods: from October 13, 1995 to May 14, 1997, from
December 14, 1998 to October 28, 1999, and from October 28, 1999 to February 24, 2000.
We label these periods N1, N2 and N3 respectively. In period N1 no financial turbulence
was observed, N2 corresponds to an important stage of the Microsoft trial, and N3 is the
interval just before the crash of April 2000, when the NASDAQ lost about 25%.

For the IPC we also select three periods: from January 6, 1994 to March 6, 1995, from
January 9, 1996 to August 13, 1997, and from August 13, 1997 to October 25, 1999. We
label these periods I1, I2 and I3. Period I1 was disastrous for the Mexican financial
market due to the crisis brought about by the presidential transition (2). In period I2 no
financial uncertainty was observed in the economy, and period I3 was highly turbulent due
to the Asian crisis. We would like to stress the difference between I1 (endogenous origin
of the crisis) and I3 (exogenous origin of the crisis).
Footnote 2: This was called the "tequilazo".

Fig. 2: Values of C(l) vs. l for the intervals of interest (see text). In Fig. 2a we have I1=(*), I2=(o) and
I3=(x). In Fig. 2b we have N1=(o), N2=(x), N3=(*).



We calculate $C(l)$ for the binary sequences associated with the above-mentioned
intervals. The results appear in Fig. 2. Notice that the values of the function $C(l)$
corresponding to those intervals where no disturbance is observed (N1 and I2) are lower
than those corresponding to intervals placed just before the crashes (N2, N3, I1 and I3).
It means that sequences corresponding to critical periods have more binary digits that are
meaningful with respect to the whole series than sequences corresponding to periods where
nothing happened. If we compare Fig. 2b with Fig. 1 of [10], we conclude that N1 behaves
almost as a random sequence.



In the paper [12] we proposed an ansatz of the type $C(l) = \delta l^{\alpha}$. The
corresponding values of $\delta$ and $\alpha$ for the sequences N1, N2, N3, I1, I2 and I3
appear in the following table:

Table I

Values of $\alpha$ and $\delta$ for the selected sequences.

Sequence    $\alpha$    $\delta$
N1          2.1059      0.0393
N2          3.2458      0.0014
N3          3.6715      0.0063
I1          3.7748      0.0003
I2          3.2213      0.0013
I3          3.6901      0.0004



The larger exponents correspond to sequences with high financial turbulence. Notice that
smaller exponents are related to the more random sequences.
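
The text does not state how the parameters of the ansatz were fitted. One common possibility, shown here only as an assumption, is an ordinary least-squares fit of log C against log l, since the ansatz becomes a straight line in those variables. The function name `fit_power_law` is hypothetical, and the data points are synthetic, generated from the Table I values for N1 merely to check that the fit recovers them.

```python
import math


def fit_power_law(ls, cs):
    """Least-squares fit of log C = log(delta) + alpha * log(l); returns (alpha, delta)."""
    xs = [math.log(l) for l in ls]
    ys = [math.log(c) for c in cs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    delta = math.exp(mean_y - alpha * mean_x)
    return alpha, delta


# Synthetic points lying exactly on the curve with the Table I values for N1.
ls = list(range(3, 13))
cs = [0.0393 * l ** 2.1059 for l in ls]
alpha, delta = fit_power_law(ls, cs)
print(alpha, delta)
```

On real data the points scatter around the line, and a log-log fit weights small values of C(l) more heavily than a direct nonlinear fit would.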
4 Conclusions. 

The results of the last section suggest that intervals where the markets work well
produce binary sequences with features close to those of random sequences, whereas
intervals with high uncertainty produce binary sequences that carry more information.
This is in agreement with the EMH [23], which states that markets are highly efficient in
determining the most rational price of the traded assets. We conclude that the exponent
$\alpha$ is a measure of how close a market is to the ideal situation described by the
EMH. The lower the exponent, the more random the sequence.

More surprising is the fact that intervals with high financial turbulence have high
informational content. This opens the challenge of understanding what kind of information
the sequences bear and how it could be used to predict crashes.